Why Quality Assurance (QA)?

The web can be challenging to archive.

Quality assurance is important for:

• checking the completeness and quality of archived sites
• checking that archived sites replay according to expectations
• improving the quality of your captures

If possible, don't wait too long to review and conduct QA!
Live websites are constantly changing and being updated. If something is
missing from the crawl, you might not be able to fix it before the content
changes or is removed.

Glossary

Crawl Report: A report generated with detailed information about what's
inside your crawl.

Host: Where web content is stored, designated by its Internet host name.
E.g. in https://archive-it.org/ the host is archive-it.org.

Wayback QA Tool: An automated quality assurance tool for improving
the quality of your captures.

Patch Crawl: A crawl to capture and patch in documents that were not
captured in your original crawl.

Robots.txt: A file that a site owner can add to their site to keep crawlers
from accessing all or parts of it.
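
To make the Robots.txt entry concrete, here is a minimal sketch that uses Python's standard `urllib.robotparser` module to test whether a URL would be blocked by a set of robots.txt rules. The sample rules, the example.org URLs, the `example-crawler` user agent, and the helper name `is_blocked` are illustrative assumptions and do not come from Archive-It.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules disallow user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


for url in (
    "https://example.org/private/report.pdf",
    "https://example.org/blog/",
):
    status = "blocked" if is_blocked(SAMPLE_ROBOTS_TXT, "example-crawler", url) else "allowed"
    print(f"{url}: {status}")
```

Documents that a crawler skips for this reason are the ones a Hosts Report would count as blocked.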
Where is my Crawl Report?

You can access your crawl reports at any time through:

• The Crawls link in the top navigation bar
• The Crawls tab within any given collection

By default, crawl reports are listed by Crawl ID, which is a unique
identifier displayed to the left of each crawl.

What's inside a Crawl Report?

• Crawl Overview
• Seeds Report
• Hosts Report
• File Types Report

Crawl Overview

[Screenshot: Crawl Overview for Test Crawl 1623583 in the collection "Festivals Around the World". Started: June 9, 2022 9:18 AM Mountain Daylight Saving Time. Completed: June 10, 2022 2:47 PM Mountain Daylight Saving Time. Total Data: 1.9 GB. Total Docs: 34,213. New Docs: 20,763. Crawl Frequency: Test. Scheduled By: system. Crawling Technology: Brozzler. Test Crawl Data: Scheduled for Deletion. A banner reads "Time left to save your crawl data: 48 Days / 3 Hours / 10 Minutes. After this time, your crawl will be deleted. (This report will remain accessible.)" and offers Save Crawl Data and Delete Crawl Data buttons. Tabs: Seeds, Hosts, File Types.]

The Crawl Overview shows many details about the crawl, including:

• New Data
• Crawl Status

Test Crawls have options to Save or Delete Crawl Data in the blue area.


REVIEW: Seeds Report 
The Seeds Report lists the Seed URLs included in the crawl.

For each seed, it also lists:

• Seed Type
• Seed Status
• Docs & Data collected
• Seed link to the Seed Management Interface
• Wayback link

Clicking directly on a Seed URL itself will open the Seed's Hosts Report.

[Screenshot: Seeds Report for Test Crawl 1623583 in the collection "Festivals Around the World", with a Download Seed List link. Seed URL: https://twitter.com/RioCarnaval/. Seed Type: Standard. Seed Status: Crawled. Docs: 34,213. New Docs: 20,763. Data: 1.9 GB. New Data: 166 MB. Below the table: Scoping Rules for this Crawl (Crawl Limits, Collection Scope Rules).]


REVIEW: Seeds' Hosts Report

[Screenshot: Hosts Report for the seed in Test Crawl 1623583, with a Download Host List link and columns including Data, New Data, Blocked, Queued, and Out of Scope. Hosts listed, with Data totals in the order shown: video.twimg.com (1.6 GB), abs.twimg.com (180.6 MB), pbs.twimg.com (94 MB), twitter.com (21.4 MB).]

Lists each host from which content was discovered for the Seed URL on
the live web.

Also lists for each host:

• Docs collected
• Data totals
• Blocked documents
• Queued documents
• Out of Scope documents

Click directly on a host's link to see the list of documents from that host.

Click directly on the numbers to see the corresponding lists of documents.

REVIEW: Hosts Report

[Screenshot: Hosts Report for Test Crawl 1630947 in the collection "Festivals Around the World" (started June 22, 2022 2:12 PM, completed June 22, 2022 6:20 PM Mountain Daylight Saving Time), with columns for Docs, New Docs, Data, New Data, Blocked, Queued, and Out of Scope. Hosts shown include pbs.twimg.com, abs.twimg.com, video.twimg.com, api.twitter.com, abs-0.twimg.com, syndication.twitter.com, analytics.twitter.com, t.co, support.twitter.com, business.twitter.com, dev.twitter.com, help.twitter.com, legal.twitter.com, and status.twitterstat.us. Most hosts show 0 blocked and 0 queued documents.]

The Hosts Report lists all hosts through which content was discovered
during the crawl, along with:

• Docs and New Docs
• Data and New Data
• Blocked Documents
• Queued Documents
• Out of Scope Documents
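
One way to read these columns during QA is to pull out the hosts that still have blocked or queued documents, since those are candidates for a patch crawl. The sketch below assumes the host rows have already been loaded into memory; the `HostRow` structure, the helper name `hosts_needing_review`, and all values are hypothetical and are not real crawl data or an Archive-It export format.

```python
from dataclasses import dataclass


@dataclass
class HostRow:
    """One row of a Hosts Report (hypothetical in-memory representation)."""
    host: str
    docs: int
    blocked: int
    queued: int
    out_of_scope: int


def hosts_needing_review(rows):
    """Return rows with blocked or queued documents, largest backlog first."""
    flagged = [row for row in rows if row.blocked > 0 or row.queued > 0]
    return sorted(flagged, key=lambda row: row.blocked + row.queued, reverse=True)


# Illustrative values only.
rows = [
    HostRow("www.example.org", docs=2900, blocked=0, queued=0, out_of_scope=12),
    HostRow("media.example.org", docs=40, blocked=14, queued=0, out_of_scope=3),
    HostRow("cdn.example.net", docs=7, blocked=2, queued=41, out_of_scope=0),
]

for row in hosts_needing_review(rows):
    print(f"{row.host}: {row.blocked} blocked, {row.queued} queued")
```

Sorting by backlog is only one possible priority; in practice you may prefer to look first at hosts that serve the content you care most about.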
REVIEW: File Types Report

[Screenshot: File Types Report for Test Crawl 1623583 in the collection "Festivals Around the World" (1.9 GB data, 166 MB new data, 34,213 docs, 20,763 new docs), with a Download List of File Types link. Rows shown:]

File Type               Docs    New Docs  Data      New Data
video/mp4               2,372   79        1.6 GB    90.9 MB
application/javascript  5,960   357       181.7 MB  12.4 MB
image/jpeg              2,484   426       112.6 MB  26.7 MB
application/json        1,940   1,445     16.7 MB   6.4 MB

The File Types Report lists documents collected by the kind of files they are.

The File Type itself lists the kind of content and then the specific format
of the documents (a MIME type), e.g. video/mp4
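
As a small illustration of reading a File Type, the sketch below splits each value at the slash into the kind of content and the format, then totals documents per kind. The helper names are hypothetical; the counts are the Docs values as read from this deck's File Types Report screenshot, so treat them as approximate.

```python
from collections import defaultdict


def split_file_type(file_type: str):
    """Split a value such as 'video/mp4' into ('video', 'mp4')."""
    kind, _, fmt = file_type.partition("/")
    return kind, fmt


def docs_by_kind(doc_counts):
    """Total the document counts for each kind of content."""
    totals = defaultdict(int)
    for file_type, docs in doc_counts.items():
        kind, _ = split_file_type(file_type)
        totals[kind] += docs
    return dict(totals)


# Docs per File Type, as read from the screenshot in this deck.
doc_counts = {
    "video/mp4": 2372,
    "application/javascript": 5960,
    "image/jpeg": 2484,
    "application/json": 1940,
}

for kind, docs in docs_by_kind(doc_counts).items():
    print(f"{kind}: {docs} docs")
```

A quick check like this can answer QA questions such as whether any video files were collected at all.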
Review demo

I will show you how to:

• Find your crawl report
• Read your crawl report

Browse Wayback pages

Check the replay of your archived pages in Wayback.
Compare their look and feel with the pages on the live web.
Ask yourself:

• Does it meet expectations?
• Will it serve future users' needs?

If not, try some QA strategies!

Look for the same features as the live web's counterpart website:

• Styling features (CSS)
  o headers/footers
  o grids/boxes
• Interactive features (JS, i.e. JavaScript)
  o dropdown menus
  o carousels/sliders
• Embedded elements
  o images (.jpg, .jpeg, .png)
  o videos (.mp4, .mp2t, .mpeg)
  o audio files (.mp3, .mpeg)
• Working links

Browse Wayback pages: look for the same features as the live web's counterpart

[Screenshots: two renderings of the Vancouver Public Library "Indigenous Storyteller in Residence" page, shown for comparison between the live web and Wayback. Both show the site navigation (Hours & Locations, Borrowing, Digital Library, Booking & Facilities, Programs & Events), the program description, the "Contact Us" details, and the "Recommended Reads" panel.]

PATCH CRAWLS

A patch crawl is a crawl to capture and patch in documents
that were not captured in your original crawl.

Two ways to run patch crawls:

• Via the Hosts Report
• Via the Wayback QA tool

PATCH CRAWL via the Hosts Report

Hosts Report's Blocked Documents

[Screenshot: Hosts Report for One-Time Crawl 1412828 in the collection "QA May 13" (started May 7, 2021 5:31 PM, completed May 8, 2021 5:33 AM Mountain Daylight Time), showing the Host List with a filter box, a check box beside each host, and a Download Host List link. Hosts shown include googlevideo.com subdomains, code.jquery.com, m.youtube.com, www.youtube.com, www.googletagmanager.com, script.crazyegg.com, and use.typekit.net.]

Start a Patch Crawl for Blocked Documents

[Screenshot: the "Run Patch Crawl (1 Host Selected)" dialog. Its text reads: "This patch crawl will capture URLs from specific hosts that were blocked by robots.txt, if the Ignore Robots.txt check box is selected. Once started, the patch crawl can be monitored from the Current Crawls area of your account. Once it has finished, a full set of reports will be generated."]

The Wayback QA tool

[Screenshot: the archived page wayback.archive-it.org/16622/20210508010845/http://www.vpl.ca/ shown before and after clicking "Enable QA" in the Wayback banner. The banner reads: "You are viewing an archived web page, collected at the request of Community Webs Training using Archive-It. This page was captured on 1:08:45 May 08, 2021, and is part of the QA May 13 collection. The information on this web page may be out of date." With QA enabled, the banner shows "Disable QA" and "View Missing Documents (13 Detected)".]

• Scans for missing documents on the Wayback page
• Allows patch crawls for missing documents
• Used for Production Captures (not Test Captures)
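
When comparing captures during QA, it can help to read the parts of a Wayback URL. The sketch below assumes the layout seen in this deck's example (collection ID, then a 14-digit capture timestamp, then the original URL); the pattern and the names `WaybackCapture` and `parse_wayback_url` are illustrative assumptions, and real URLs may take other forms that this does not handle.

```python
import re
from datetime import datetime
from typing import NamedTuple


class WaybackCapture(NamedTuple):
    collection_id: str
    captured_at: datetime
    original_url: str


# Assumed layout: wayback.archive-it.org/<collection>/<YYYYMMDDhhmmss>/<original URL>
_PATTERN = re.compile(
    r"^(?:https?://)?wayback\.archive-it\.org/(\d+)/(\d{14})/(.+)$"
)


def parse_wayback_url(url: str) -> WaybackCapture:
    """Split a Wayback URL into collection ID, capture time, and original URL."""
    match = _PATTERN.match(url)
    if match is None:
        raise ValueError(f"not a recognized Archive-It Wayback URL: {url}")
    collection_id, timestamp, original_url = match.groups()
    captured_at = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return WaybackCapture(collection_id, captured_at, original_url)


capture = parse_wayback_url(
    "wayback.archive-it.org/16622/20210508010845/http://www.vpl.ca/"
)
print(capture.collection_id, capture.captured_at.isoformat(), capture.original_url)
```

The parsed timestamp for this example corresponds to the capture time reported in the Wayback banner (1:08:45 May 08, 2021).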
Patch Crawl with Wayback QA

In the partner account application

[Screenshot: the Wayback QA page for collection 16622 ("QA May 13") at partner.archive-it.org/2031/collections/16622/qa, showing Missing Documents, Ignored Documents, and Patch Crawl options. The Missing Document List for http://www.vpl.ca/ shows each missing URL with its filter match, a status (for example "Invalid (Access Denied)" or "Not Crawled"), file type, size, and date, along with a Download Wayback QA List link.]

[Screenshot: the "Run Patch Crawl" dialog, which asks you to select options for a patch crawl of the selected missing documents. It includes an "Ignore Robots.txt" check box to capture documents blocked by robots.txt.]

QUALITY ASSURANCE: Use the Wayback QA tool

Detects missing documents on the page and allows you to patch crawl them.

HELPER SEEDS

[Screenshot: the live site https://www.halifaxpubliclibraries.ca/ open to the blog post "How To: Get Started with OverDrive Magazines", with the subpage URL called out in the browser's address bar.]

• Copy the subpage URL, e.g. https://www.halifaxpubliclibraries.ca/blogs/post/how-to-get-started-with-overdrive-magazines/
• Add the URL as its own seed

[Screenshot: the "Add Seeds" dialog ("Enter one seed URL per line below to add them to this collection.") with the subpage URL pasted in and the Frequency set to One-Time. Behind it is the collection's Seed List, which already contains the seed https://www.halifaxpubliclibraries.ca/ (One-Time, Standard, Public, Active).]

Open Wayback pages in NEW TABS

[Screenshot: the archived page wayback.archive-it.org/16147/20210311203040/https://www.halifaxpubliclibraries.ca/ (captured 20:30:40 Mar 11, 2021, part of the Review and QA collection), with the browser's right-click menu open on a link: Open Link in New Tab, Open Link in New Window, Open Link in Incognito Window, Save Link As..., Copy Link Address, Inspect.]

Right-click a link and choose "Open Link in New Tab". Useful for opening:

• Video watch pages from a YouTube channel
• Links built with JavaScript
• Subpage URLs with a #, e.g. https://www.halifaxpubliclibraries.ca/#browse_menu
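
For subpage URLs with a #, it may help to know that the part after the # is a fragment: the browser handles it locally and does not send it to the server. The sketch below uses Python's standard `urllib.parse.urldefrag` to separate the two parts of the example URL above; the helper name `split_fragment` is a hypothetical choice for illustration.

```python
from urllib.parse import urldefrag


def split_fragment(url: str):
    """Return (URL without fragment, fragment) for the given URL."""
    base, fragment = urldefrag(url)
    return base, fragment


base, fragment = split_fragment("https://www.halifaxpubliclibraries.ca/#browse_menu")
print("Requested from the server:", base)
print("Handled in the browser:   ", fragment)
```

Because only the base URL is requested, what appears for a # link depends on scripts running in the page, which may be one reason such links behave differently in replay.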


[Screenshot: the archived subpage https://www.halifaxpubliclibraries.ca/adults/ opened in a new tab in Wayback (captured 3:00:24 Mar 12, 2021, part of the Review and QA collection), with QA enabled and the banner showing "View Missing Documents (0 Detected)".]

Quality Assurance demo

I will show you how to:

• Load and look over the Wayback page
• Use the Wayback QA Tool to patch crawl
• Patch crawl via the Hosts Report

QUALITY ASSURANCE: Leverage the Help Center

Review the documentation in our Help Center anytime:

• Scoping guidance for specific kinds of sites
  o What is the usual scoping?
  o What is the expected replay?
• System Status / Social media and other platforms status
  o How are systems and platforms currently performing?
• Known Web Archiving Challenges
  o Is this content archivable?
• Troubleshooting Browser Issues
  o Is my browser interfering with replay?
• What to do when you see a blank page in Wayback
  o Have I tried Brozzler?
  o Is JavaScript interfering?

QA CHECKLIST

• Prioritize: audiovisual and dynamic content; significant properties.
• Check crawl and seed status, data, and docs.
• Replay and browse with Wayback.
• Find missing documents: blocked, queued, and "out of scope."
• Patch crawl missing documents for prioritized pages.
• Leverage the Help Center, particularly the System Status page.
• When in need, ask for help.

Photo courtesy of Towfiqu Barbhuiya on Unsplash

LEARN MORE

Help Center: https://support.archive-it.org/hc/en-us
Check out our blog: www.archive-it.org/blog

Thank you! Any questions?

YouTube

• Watch pages / embedded videos
• Channels / users / playlists
  o Videos may need to be opened in new browser tabs in order to replay
  o Videos from playlists may not replay in Wayback
• Expectations
  o Replays most reliably in Chrome
  o Videos will replay in-page and via the "Videos" link in the Wayback banner
• Troubleshooting
  o Did I use Brozzler?
  o yt-dlp: did it run? (look for host "youtube-dl" in the Hosts Report)
  o File Types Report: were video files collected?

youtube-dl / yt-dlp

• Open-source command-line utility for retrieving media, which enables
  the preservation of full, discrete files (e.g. MP4, MP3, MKV) for access
• Runs on web pages during the crawling process and deposits the media
  items it finds into WARC files with corresponding JSON metadata
• When it runs as expected, Wayback can load information from the
  corresponding JSON metadata files into the banner message to:
  o specify how many media items were archived
  o provide direct access to media via a media player

youtube-dl

[Screenshot: a Wayback banner for a page captured at 16:46:34 on Feb 04, 2020, reading "Found 2 archived media items out of 2 total on this page", alongside a Hosts Report excerpt listing support.google.com, several googlevideo.com hosts (one with 193 MB of data), i.ytimg.com, accounts.google.com, static.doubleclick.net, youtube.com, and play.google.com.]

Additional Resource Roundup

• Archive-It System and Platform Status page
• Help Center articles for specific platforms
• What to do when you see a blank page
• Guide to A/V web archiving with youtube-dl
• How to use the Wayback QA tool
• How to clear your cache, cookies, and history
• Upload WARCs
• The Wayback Machine